WebMonitoring software system: Finite state machines for monitoring the web
This paper presents a software system called WebMonitoring. The system is designed to solve two problems that arise when searching for information on the web. The first is improving how queries are entered at search engines and enabling searches more complex than keyword-based ones. The second is providing access to web page content that common search engines cannot reach, either because of the search engines' crawling limitations or because of the delay between the moment a web page is published on the Internet and the moment a crawler finds it. The architecture of the WebMonitoring system relies on finite state machines and the concept of monitoring the web. We present the system's architecture and usage. Some modules were developed specifically for the WebMonitoring system, and some rely on UNITEX, a linguistically oriented software system. We evaluate the WebMonitoring system and give directions for further development.
La traduction des noms propres : une étude en corpus (The translation of proper names: a corpus study)
In this paper, we tackle the problem of the translation of proper names. We introduce our hypothesis, according to which the widespread claim that proper names cannot be translated can be contradicted. Then, we describe the construction of the aligned multilingual corpus that we use to illustrate our point. Finally, we evaluate both the contributions and the limits of this corpus for our study.
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitizing the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year-long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach is compatible with several standard structured forms and ontologies (TEI, LMF, Ontolex, LexInfo). A lexical database model was designed in compliance with these structured forms, following mostly the lemon model. Mapping the lexical entry markers to LexInfo and TEI enabled export of the lexical data to these formats. A software solution for dictionary text analysis, parsing, and lexical database population was developed and tested on the first and the last published volumes of the dictionary (which contain 27,141 articles in total). An evaluation of the results shows that the developed model and software solution can be successfully used for the other volumes as well.
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
In this paper we present a model for the selection of good dictionary examples for Serbian and the development of initial model components. The method is based on a thorough analysis of various lexical and syntactic features in a corpus compiled from examples in the five digitized volumes of the Serbian Academy of Sciences and Arts (SASA) dictionary. The initial set of features was inspired by a similar approach for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples extracted from corpora comprising various texts. The analysis showed that there is a group of features which are strong indicators that a sentence should not be used as an example. The remaining features, including detection of non-standard and other marked lexis from the SASA dictionary, are used for ranking. The selected candidate examples, represented as feature vectors, are used with the GDEX ranking tool for Serbian candidate examples and with a supervised machine learning model for classifying Serbian sentences as standard or non-standard, for further integration into a solution for present and future dictionary production projects.
Multiword expressions: Insights from a multi-lingual perspective
Multiword expressions (MWEs) are a challenge for both natural language applications and linguistic theory because they often defy the application of the machinery developed for free combinations, where the default is that the meaning of an utterance can be predicted from its structure. There is a rich body of primarily descriptive work on MWEs for many European languages, but little comparative work. The volume brings together MWE experts to explore the benefits of a multilingual perspective on MWEs. The ten contributions in this volume look at MWEs in Bulgarian, English, French, German, Maori, Modern Greek, Romanian, Serbian, and Spanish. They discuss prominent issues in MWE research such as the classification of MWEs, their formal grammatical modeling, and the description of individual MWE types from the point of view of different theoretical frameworks, such as Dependency Grammar, Generative Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Lexicon Grammar.